A Tour through Geoconnex.us

Introduction

What is geoconnex.us? It is a framework that lets providers of water data publish structured, linked metadata so that web crawlers can organize it into a knowledge graph. The metadata should be formatted as JSON-LD. The knowledge graph can then be used to build a wide array of information products that answer water-related questions. This vignette is a mockup of what navigating the Geoconnex knowledge graph might look like at build-out.

Beginning our journey – community reference features

The geoconnex.us framework includes a large and growing catalog of “community reference features”: neutral internet representations of real-world locations and areas that water data providers can use to describe what their published data is about. Many of these reference features are available from the website https://info.geoconnex.us, which is powered by PyGeoAPI, a Python implementation of OGC API-Features. This API enables programmatic interaction with spatial features on the web.

What’s available from info.geoconnex.us? We have collated a variety of common hydrologic and administrative locations and boundaries, and will continue to add more.

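# Packages whose functions are called without a namespace prefix below
library(dplyr)
library(jsonlite)
library(sf)

collection_url <- "https://info.geoconnex.us/collections"
collections <- jsonlite::fromJSON(collection_url)

knitr::kable(select(collections$collections, title, description))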
| title | description |
|:------|:------------|
| HU02 | Two-digit Hydrologic Regions |
| HU04 | Four-digit Hydrologic Subregion |
| HU06 | Six-digit Hydrologic Basins |
| HU08 | Eight-digit Hydrologic Subbasins |
| HU10 | Ten-digit Watersheds |
| National Aquifers | National Aquifers of the United States |
| Reference Gages | US Reference Stream Gage Monitoring Locations |
| States | U.S. States |
| Counties | U.S. Counties |
| American Indian/Alaska Native Areas/Hawaiian Home Lands (AIANNH) | Native American Lands |
| Core-based statistical areas (CBSA) | U.S. Metropolitan and Micropolitan Statistical Areas |
| Urban Areas | Urbanized Areas and Urban Clusters (2010 Census) |
| Places | U.S. legally incororated and Census designated places |
| Public Water Systems | U.S. Public Water Systems |

Let’s use New Mexico as our area of interest.

nm_url <- "https://info.geoconnex.us/collections/states/items?STUSPS=NM"

nm <- sf::read_sf(nm_url)

mapview::mapview(list(`New Mexico` = nm))
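
The query above filters the states collection with a `property=value` pair in the URL's query string. As a minimal sketch of how such URLs could be assembled, the helper below builds an items URL from named filters. `items_url()` is a hypothetical name, not part of any package, and the sketch assumes the server accepts simple equality filters like the `STUSPS=NM` example.

```r
# Hypothetical helper (not part of any package): build an OGC API-Features
# "items" URL for a collection from named property filters.
items_url <- function(collection, ...,
                      base = "https://info.geoconnex.us/collections") {
  filters <- list(...)
  url <- paste0(base, "/", collection, "/items")
  if (length(filters) == 0) return(url)
  # Percent-encode each value so URIs and spaces survive as query parameters
  values <- vapply(filters,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0(url, "?", paste0(names(filters), "=", values, collapse = "&"))
}

items_url("states", STUSPS = "NM")
items_url("places", STATEFP = 35, limit = 100)
```

The resulting string can be passed to `sf::read_sf()` in the same way as the hand-written URLs in this vignette.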

The search above gave us one state that we can retrieve by its ID. Below, we request its JSON-LD representation and print it in two forms: the raw JSON-LD, and the “flattened” form that interprets the fields according to published linked data vocabularies such as https://schema.org.

accept_jsonld <- httr::add_headers("Accept" = "application/ld+json")

nm_ld <- rawToChar(httr::GET(nm$id, config = accept_jsonld)$content)

prettify(nm_ld)
## {
##     "@context": [
##         {
##             "schema": "https://schema.org/",
##             "geojson": "https://purl.org/geojson/vocab#",
##             "Feature": "geojson:Feature",
##             "FeatureCollection": "geojson:FeatureCollection",
##             "Point": "geojson:Point",
##             "bbox": {
##                 "@container": "@list",
##                 "@id": "geojson:bbox"
##             },
##             "coordinates": {
##                 "@container": "@list",
##                 "@id": "geojson:coordinates"
##             },
##             "features": {
##                 "@container": "@set",
##                 "@id": "geojson:features"
##             },
##             "geometry": "geojson:geometry",
##             "id": "@id",
##             "properties": "geojson:properties",
##             "type": "@type"
##         },
##         {
##             "schema": "https://schema.org/",
##             "NAME": "schema:name",
##             "census_profile": {
##                 "@id": "schema:subjectOf",
##                 "@type": "@id"
##             }
##         }
##     ],
##     "type": "Feature",
##     "properties": {
##         "fid": 14,
##         "STATEFP": "35",
##         "STATENS": "00897535",
##         "AFFGEOID": "0400000US35",
##         "GEOID": "35",
##         "STUSPS": "NM",
##         "NAME": "New Mexico",
##         "LSAD": "00",
##         "uri": "https://geoconnex.us/ref/states/35",
##         "census_profile": "https://data.census.gov/cedsci/profile?g=0400000US35",
##         "id": "https://geoconnex.us/ref/states/35"
##     },
##     "id": "https://geoconnex.us/ref/states/35"
## }
## 
nm_ld <- jsonld::jsonld_flatten(nm_ld)

nm_ld
## [
##   {
##     "@id": "https://geoconnex.us/ref/states/35",
##     "@type": [
##       "https://purl.org/geojson/vocab#Feature"
##     ],
##     "https://purl.org/geojson/vocab#properties": [
##       {
##         "@id": "https://geoconnex.us/ref/states/35"
##       }
##     ],
##     "https://schema.org/name": [
##       {
##         "@value": "New Mexico"
##       }
##     ],
##     "https://schema.org/subjectOf": [
##       {
##         "@id": "https://data.census.gov/cedsci/profile?g=0400000US35"
##       }
##     ]
##   }
## ]
nm_ld <- fromJSON(nm_ld)

This gives us some basic information. The @id here is especially useful. Note that the @id (the subject of all the triples in the document) is the same as the id of the State GeoJSON we mapped above and used to retrieve this JSON-LD document.

Notice that we can get a name using the linked data property https://schema.org/name. This is an example of the kind of structured data that allows information products to be created automatically, which is an aim of geoconnex.us.

nm_feature <- sf::read_sf(nm_ld$`@id`)

nm_feature_name <- nm_ld$`https://schema.org/name`[[1]]$`@value`

# Could make html links clickable in mapview

nm_map_layer <- setNames(list(nm_feature), nm_feature_name)

mapview::mapview(nm_map_layer)

Now let’s pivot to look at some data that might be of particular interest. Let’s say I’m most interested in Las Vegas, New Mexico. Where is this city, and what is its boundary? Let’s search for this place in the U.S. Census Places feature collection within NM (FIPS code 35).

## nm
NM_places <- sf::read_sf("https://info.geoconnex.us/collections/places/items?STATEFP=35&limit=10000000")

LasVegas <- filter(NM_places, NAME=="Las Vegas")

mapview::mapview(LasVegas)
print(LasVegas$uri)
## [1] "https://geoconnex.us/ref/places/3539940"

By printing the uri, we can see the Uniform Resource Identifier (URI) for the city of Las Vegas, NM. Now, what if I want to see which Public Water Systems (PWS) serve this city? Let’s see what Queryables the PWS reference features offer:

knitr::kable(fromJSON("https://info.geoconnex.us/collections/pws/queryables"))
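| queryable | type |
|:----------|:-----|
| fid | integer |
| geom | geometry |
| PWSID | text |
| NAME | text |
| BOUNDARY_TYPE | text |
| CITY_SERVED | text |
| CITY_SERVED_uri | text |
| ST | text |
| ST_uri | text |
| PROVIDER | text |
| POPULATION_SERVED_COUNT | real |
| SYSTEM_SIZE | text |
| uri | text |
| SDWIS | text |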

We can query by CITY_SERVED_uri. Let’s try it.

LasVegasURI <- "https://geoconnex.us/ref/places/3539940"

LasVegas_PWS <- sf::read_sf(paste0("https://info.geoconnex.us/collections/pws/items?CITY_SERVED_uri=",LasVegasURI))

mapview::mapview(LasVegas_PWS, color="blue") + mapview::mapview(LasVegas, col.regions="green") 

Clicking around, we can find what seems to be the main water provider, PWSID NM3518025 (@id https://geoconnex.us/ref/pws/NM3518025). The embedded JSON-LD includes a “subjectOf” link to the USEPA Safe Drinking Water Information System (SDWIS), indicating that this EPA page is about this particular water system. As geoconnex.us expands and develops, every community reference feature such as this one should have many links that direct us to all kinds of metadata and data, published by other organizations, that are relevant to the feature.

lv_pws_ld <- rawToChar(httr::GET("https://geoconnex.us/ref/pws/NM3518025", config = accept_jsonld)$content)

lv_pws_ld <- jsonld::jsonld_flatten(lv_pws_ld)

lv_pws_ld
## [
##   {
##     "@id": "https://geoconnex.us/ref/pws/NM3518025",
##     "@type": [
##       "https://purl.org/geojson/vocab#Feature"
##     ],
##     "https://purl.org/geojson/vocab#properties": [
##       {
##         "@id": "https://geoconnex.us/ref/pws/NM3518025"
##       }
##     ],
##     "https://schema.org/geoIntersects": [
##       {
##         "@id": "https://geoconnex.us/ref/places/3539940"
##       }
##     ],
##     "https://schema.org/geoWithin": [
##       {
##         "@id": "https://geoconnex.us/ref/states/35"
##       }
##     ],
##     "https://schema.org/isBasedOn": [
##       {
##         "@id": "https://catalog.newmexicowaterdata.org/en/dataset/public-water-supply-areas"
##       }
##     ],
##     "https://schema.org/name": [
##       {
##         "@value": "LAS VEGAS (CITY OF)\r\n"
##       }
##     ],
##     "https://schema.org/subjectOf": [
##       {
##         "@id": "https://enviro.epa.gov/enviro/sdw_report_v3.first_table?pws_id=NM3518025&state=NM&source=Surface%20water&population=18044"
##       }
##     ]
##   }
## ]
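
Each predicate in the flattened document above, together with the `@id` of the node, forms a subject-predicate-object statement. As a rough sketch of how those statements could be tabulated, the code below parses an abridged copy of the output with jsonlite. The helper name `node_to_triples()` is hypothetical, and the sketch assumes every object is either a link (`@id`) or a literal (`@value`), as in the output above.

```r
library(jsonlite)

# Abridged copy of the flattened JSON-LD printed above
flat <- '[{
  "@id": "https://geoconnex.us/ref/pws/NM3518025",
  "@type": ["https://purl.org/geojson/vocab#Feature"],
  "https://schema.org/geoIntersects": [{"@id": "https://geoconnex.us/ref/places/3539940"}],
  "https://schema.org/geoWithin": [{"@id": "https://geoconnex.us/ref/states/35"}]
}]'

# Keep nested lists as-is so every node has the same shape
nodes <- fromJSON(flat, simplifyVector = FALSE)

# Hypothetical helper: one row per (subject, predicate, object) statement
node_to_triples <- function(node) {
  predicates <- setdiff(names(node), c("@id", "@type"))
  rows <- lapply(predicates, function(p) {
    objects <- vapply(node[[p]], function(o) {
      # An object is either a link ("@id") or a literal ("@value")
      if (!is.null(o[["@id"]])) o[["@id"]] else as.character(o[["@value"]])
    }, character(1))
    data.frame(subject = node[["@id"]], predicate = p, object = objects,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

triples <- do.call(rbind, lapply(nodes, node_to_triples))
triples
```

A table like this makes it easy to filter for one relationship, for example all rows whose predicate is https://schema.org/geoWithin.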

Beyond reference features - organizational data

Now let’s find some more of this other data published by other organizations, not just the reference features. First, let’s widen our scope a bit. Let’s say we’re interested in water data that is in the same HUC8 as Las Vegas.

hu08_url <- paste0("https://info.geoconnex.us/collections/hu08/items?bbox=",
                   paste(sf::st_bbox(nm_feature), collapse = ","))

hu08 <- sf::read_sf(hu08_url)

hu08 <- hu08[LasVegas,]

hu08$NAME
## [1] "Pecos Headwaters"
mapview::mapview(list("Pecos Headwaters" = hu08, "Las Vegas" = LasVegas),col.regions=c("green","blue"))

Just working with spatial intersections from the HUC8 reference feature collection, we find the relevant HUC8 is the Pecos Headwaters. Now, we can make use of all data within the Geoconnex.us system that has been published by all organizations. On build-out, we will be able to interact with a knowledge graph that is being continually updated by web crawlers that harvest JSON-LD from all participating provider websites. Parts of this knowledge graph, in turn, would be re-presented in the reference features JSON-LD. In this example, the JSON-LD from the page for the Pecos Headwaters (https://geoconnex.us/ref/hu08/13060001) might include links with the relationship label “geoContains” for every water data site about a location within that HUC8. For this exercise, we will load a pre-processed list of dataframes with the same information.

(Can take this further by making an actual graph, putting in a triple store, and interacting with SPARQL — may be more trouble than worth)

Layers can be turned on and off with the layers button on the top left for greater readability. That’s a lot of data though!

load("graph.rds")
names(within_hu08_13060001)
## [1] "HUC8"                     "PWS"                     
## [3] "NMED-SDWIS_Sample_Points" "NMED-NPDES"              
## [5] "NMBGMR-Wells"             "Reference Gages"         
## [7] "WaDE Sites"
mapview::mapview(within_hu08_13060001,
                 col.regions=c("green",
                               "blue",
                               "violet",
                               "red",
                               "orange",
                               "black",
                               "white"),
                 cex = c(3,
                         3,
                         9,
                         10,
                         10,
                         8,
                         4))

It looks like, in addition to the HUC8 and the PWS, we also have data from:

  • New Mexico Environment Department (NMED) National Pollutant Discharge Elimination System (NPDES) discharge locations
  • NMED drinking water sample test points
  • Wells monitored by the New Mexico Bureau of Geology and Mineral Resources
  • Streamgages currently or historically in the USGS National Water Information System (NWIS)
  • Points of diversion with data managed by the Western States Water Council Water Data Exchange (WaDE)

A key part of the philosophy of geoconnex is that metadata is published independently by organizations and harvested automatically. Hovering over the various features on the map, we can see that all sites have a uri (@id) that begins with “https://geoconnex.us/”. However, following these links takes us to different websites, hosted on different servers by different organizations that are not aware of each other. For example, https://geoconnex.us/nmwdi/nmbgmr/wells/WL-0183 redirects to http://wells.newmexicowaterdata.org/collections/nmbgmr_wells/items/WL-0183 , a PyGeoAPI instance operated by the New Mexico Water Data Initiative. Meanwhile, https://geoconnex.us/wade/sites/NM_146344 redirects to https://wade-test.geoconnex.us/collections/WaDE/items/NM_146344, a separate PyGeoAPI instance operated by the Water Data Exchange. Because both data systems provide JSON-LD markup, their metadata can be harvested automatically to create the data discovery workflow visualized here.

This allows us to browse through the metadata. For example, perhaps we are only interested in the SDWIS points and the NPDES permit for the Las Vegas drinking water system:

npdes <- within_hu08_13060001$`NMED-NPDES`
lv <- filter(within_hu08_13060001$PWS,
             CITY_SERVED_uri == "https://geoconnex.us/ref/places/3539940")
lv <- st_as_sfc(st_bbox(lv))
npdes <- npdes[lv,]
lv <- filter(within_hu08_13060001$PWS,uri=="https://geoconnex.us/ref/pws/NM3518025")

mapview::mapview(list("Las Vegas NPDES Points"=npdes), col.regions="red") + mapview::mapview(list("Las Vegas PWS"=lv))

The true power of the system comes when organizations link data to the structured metadata they publish and that geoconnex.us can harvest. For example, the southernmost NPDES point (corresponding to the Las Vegas Wastewater Treatment Plant) includes linked data in the “sta” variable, referring to the OGC SensorThings API, at this URL: https://st.newmexicowaterdata.org/FROST-Server/v1.1/Things(2682)?$expand=Datastreams/Observations. Parsing the JSON response is simple. Thus, we can call the actual underlying data from NMED from the harvested metadata, learning that the Las Vegas Wastewater Treatment Plant has an NPDES permit that is effective as of May 01, 2017, and expires April 30, 2022.

data <- as.data.frame(fromJSON(npdes[2,]$sta)$Datastreams$Observations)
print('as.data.frame(fromJSON(npdes[2,]$sta)$Datastreams$Observations)')
## [1] "as.data.frame(fromJSON(npdes[2,]$sta)$Datastreams$Observations)"
knitr::kable(select(data,result,resultTime))
| result | resultTime |
|:-------|:-----------|
| expiration | 2022-04-30T00:00:00.000Z |
| effective | 2017-05-01T00:00:00.000Z |
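
To make the shape of that response more concrete, here is a small sketch that parses an inline, abridged stand-in for a SensorThings `Things(...)?$expand=Datastreams/Observations` response. The JSON below is illustrative: only the two `result`/`resultTime` pairs come from the table above, and the `name` field is a placeholder. The real response carries many more fields.

```r
library(jsonlite)

# Illustrative response, abridged to the two fields used above
sta_json <- '{
  "name": "hypothetical Thing",
  "Datastreams": [
    {"Observations": [
      {"result": "expiration", "resultTime": "2022-04-30T00:00:00.000Z"},
      {"result": "effective",  "resultTime": "2017-05-01T00:00:00.000Z"}
    ]}
  ]
}'

thing <- fromJSON(sta_json, simplifyVector = FALSE)

# Walk every Datastream and collect its Observations as rows
obs <- do.call(rbind, lapply(thing$Datastreams, function(ds) {
  do.call(rbind, lapply(ds$Observations, function(o) {
    data.frame(result = o$result, resultTime = o$resultTime,
               stringsAsFactors = FALSE)
  }))
}))

# Convert the ISO 8601 timestamps to dates (the time part is dropped)
obs$date <- as.Date(substr(obs$resultTime, 1, 10))
obs
```

Parsing with `simplifyVector = FALSE` keeps the nesting explicit, which is convenient when a Thing has several Datastreams with different numbers of Observations.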

Complementary Tools - the NLDI and API Clients

Let’s say we’re interested in conditions downstream of this wastewater treatment plant (WWTP). We can use the USGS Network Linked Data Index (NLDI) to discover relevant data in that system. We can use the WWTP (@id https://geoconnex.us/nmwdi/nmbgmr/wells/NM0028827) as a starting point, and use the NLDI service to find the downstream mainstem.

geom_site <- sf::read_sf("https://geoconnex.us/nmwdi/nmbgmr/wells/NM0028827")
point <- sf::st_sfc(geom_site$geometry)
geom_site <- geom_site$geometry[[1]]
nldi_point<-nhdplusTools::discover_nhdplus_id(point)
# nldiURL<-"DM = "https://labs.waterdata.usgs.gov/api/nldi/linked-data/nwissite/USGS-08279500/navigate/DM?distance=100"

nldi_query <- URLencode(paste0('https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position?f=json&coords=POINT(',geom_site[1],' ',geom_site[2],')'))
mainstem <- sf::read_sf(nldi_query)
mainstem <- sf::read_sf(paste0(mainstem$navigation,"/DM/flowlines?distance=250"))
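
# The position query above embeds the site's coordinates as a WKT point,
# POINT(lon lat), in the `coords` parameter. A minimal sketch of that step as a
# reusable function follows; `nldi_position_url()` is a hypothetical name, and
# the base URL is the one used in this vignette.

```r
# Hypothetical helper: build the NLDI "position" query for a lon/lat pair
nldi_position_url <- function(lon, lat,
                              base = "https://labs.waterdata.usgs.gov/api/nldi/linked-data") {
  # WKT puts longitude first, then latitude
  wkt <- sprintf("POINT(%s %s)",
                 format(lon, digits = 15), format(lat, digits = 15))
  # URLencode() leaves the parentheses alone and turns the space into %20
  utils::URLencode(paste0(base, "/comid/position?f=json&coords=", wkt))
}

# Illustrative coordinates only, not the actual location of the site above
nldi_position_url(-105.2, 35.6)
```

# Keeping the coordinate order in one place avoids the common mistake of
# passing latitude before longitude.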

mapview::mapview(list("Pecos Headwaters"=within_hu08_13060001$HUC8), col.regions="green") +
mapview::mapview(list("Las Vegas NPDES Points"=npdes), col.regions="red") + mapview::mapview(list("Las Vegas PWS"=lv), col.regions="black") +
mapview::mapview(list("Downtream Mainstem"=sf::st_geometry(mainstem)[within_hu08_13060001$HUC8,]), color="blue")
stream_gages_url <- paste0("https://info.geoconnex.us/collections/gages/items?limit=10000&bbox=",
                           paste(sf::st_bbox(hu08), collapse = ","))

stream_gages <- sf::st_intersection(sf::read_sf(stream_gages_url),
                                    hu08)


mapview::mapview(stream_gages) + mapview::mapview(hu08, col.regions="green")

Browsing around these sites a bit, let’s use the PECOS RIVER NEAR PUERTO DE LUNA, NM. Since the reference gages include network locations on the NHDPlusV2, we can use them with the Hydro Network Linked Data Index.

We can use the R package nhdplusTools to interact with the NLDI. Below, we get the basin boundary, the mainstem, and all sites in the Western States Water Council Water Data Exchange upstream of this stream gage.

site <- filter(stream_gages, name == "PECOS RIVER NEAR PUERTO DE LUNA, NM")

nldi_feature <- list(featureSource = "nwissite", 
                     featureID = paste0("USGS-", site$provider_id))

basin <- nhdplusTools::get_nldi_basin(nldi_feature)

mainstem <- nhdplusTools::navigate_nldi(nldi_feature, "UM", "flowlines", distance_km = 500)

wade <- nhdplusTools::navigate_nldi(nldi_feature, "UM", "wade", distance_km = 500)


mapview::mapview(wade, col.regions="white") + mapview::mapview(hu08, col.regions="green") + 
mapview::mapview(sf::st_geometry(mainstem), color="blue")